The Dynamics Between Socioeconomic Connectedness, Cohesiveness, and Civic Engagement

This blog post is the final project for the DH140 course.
Author

Mia Sadowski

Published

August 4, 2023

Modified

August 4, 2023

Introduction to The Project

People form different social connections and become part of different cultures depending on their socioeconomic background. The Social Capital Atlas, the foundational dataset of this project, aims to measure exactly this. The dataset comes from Data for Good at Meta, the company that owns Facebook and Instagram, and is organized into separate .csv files covering US counties, ZIP codes, colleges, and high schools. It quantifies three main measurements. First, economic connectedness measures the share of friendships between people of different or similar socioeconomic statuses, depending on the category analyzed. Second, cohesiveness measures how often a person’s friends are also friends with each other. Finally, civic engagement measures how often people belong to volunteering or activism groups on Facebook.

Motivation

Growing up, I noticed economic disparity within my own hometown and how it shaped relationship circles. I grew up in a suburban beach town called Ventura, about an hour away from Los Angeles County. It was generally a quiet town; however, I noticed some economic and racial disparity, with certain communities being separated from the rest. I especially noticed this in friend groups within public high schools. Those who were wealthier and lived in more expensive areas typically congregated together, while those who lived in less expensive areas were excluded from these groups and instead formed social circles of their own. To my knowledge, most of the exclusion came not from direct bullying but merely from differences in where people lived and in general community standards.

While I am unable to explore racial disparity with this dataset, I can see whether economic disparity affects friend groups across the country. It will also be interesting to see whether higher rates of inclusion go along with higher rates of volunteering, which might point to a set of community values that sets a place apart, or whether the two are unrelated.

Research Questions

To learn more about economic disparity and how social circles affect societal impact and the public good, I decided to analyze the following research questions.

  1. Among an adult population, how does the connectedness between individuals with varying socioeconomic statuses relate to the cohesiveness of friend groups?

  2. To what extent does higher social cohesion in a society influence rates of civic engagement?

Methods

This section explains the data and the analytical process used to address the research questions.

Summary Information

The Social Capital Atlas consists entirely of quantitative data derived from Facebook users. It draws on Facebook information to which users have granted access and covers nearly every county in the United States. While there are separate spreadsheets for ZIP codes, colleges, and high schools throughout the country, this research project analyzes counties specifically for a more focused narrative. Only users who were aged 25-44, had been on Facebook at least once in the prior 30 days, had at least 100 U.S.-based Facebook friends, and had a non-missing ZIP code were included. The dataset’s authors apply privacy protections so that personal data about individuals cannot be learned from it.

import pandas as pd
import matplotlib.pyplot as plt
import folium

df = pd.read_csv('https://data.humdata.org/dataset/85ee8e10-0c66-4635-b997-79b6fad44c71/resource/ec896b64-c922-4737-b759-e4bd7f73b8cc/download/social_capital_county.csv')

Nearly every column measures connectedness, cohesiveness, or civic engagement in some form. Many of the columns will not be used in the final analysis because they are not relevant to the research questions. Here is a summary of the columns this project uses, followed by an explanation of why the others are left out.

ec_county - This measures the level of economic connectedness within a county. It is calculated as two times the share of high-SES (socioeconomic status) friends among low-SES individuals, averaged over the low-SES individuals in the county. This definition comes from the research article “Social Capital I: Measurement and Associations with Economic Mobility.”
ec_high_county - This measures the connectedness among high-SES individuals only, again as an average. The main difference is that it does not capture how connected they are with people of lower socioeconomic status.
exposure_grp_mem_county - This measures how often low-SES individuals are exposed to high-SES individuals, using the same formula as ec_county.
clustering_county - This is the average fraction of an individual’s friend pairs who are also friends with each other, counting only people within the relevant county.
volunteering_rate_county - The share of Facebook users within a county who are predicted to be members of a volunteering or activism group. Secret groups and large groups that Facebook identified as “clearly misclassified” were not included in this calculation.
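
To make the ec_county definition above concrete, here is a minimal sketch of the calculation on made-up numbers. The function name `economic_connectedness` and the toy shares are illustrative assumptions; they are not part of the dataset or of the original paper's code.

```python
from statistics import mean


def economic_connectedness(high_ses_friend_shares):
    """Return two times the average share of high-SES friends.

    Each entry is the fraction (0 to 1) of one low-SES person's
    friends who are high-SES.
    """
    return 2 * mean(high_ses_friend_shares)


# Hypothetical shares for three low-SES individuals (made-up numbers).
toy_shares = [0.50, 0.25, 0.45]
toy_ec = economic_connectedness(toy_shares)
print(round(toy_ec, 3))
```

Under this definition, a value of 1 corresponds to low-SES individuals having, on average, half of their friends in the high-SES group.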

There are many other columns. One main group covers childhood connectedness specifically; these columns are excluded because the research questions address the adult population. Two further columns are also excluded because they would not address the research questions effectively: support_ratio_county, which measures the proportion of friendships in which the two friends share a third mutual friend, and civic_organizations_county, which measures the number of Facebook Pages predicted to be “Public Good” pages within a county.

# Columns that are not needed for the research questions.
columns_removed = [
    'num_below_p50', 'pop2018',
    'child_ec_county', 'child_ec_se_county',
    'ec_grp_mem_county',
    'child_high_ec_county', 'child_high_ec_se_county',
    'ec_grp_mem_high_county',
    'child_exposure_county', 'child_high_exposure_county',
    'bias_grp_mem_county', 'bias_grp_mem_high_county',
    'child_bias_county', 'child_high_bias_county',
    'support_ratio_county', 'civic_organizations_county',
]

df = df.drop(columns=columns_removed)
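
Dropping by a hard-coded list raises a KeyError if the published CSV ever renames or omits one of the columns. A more defensive variant, sketched below on a tiny made-up frame, drops only the columns that are present and reports the rest. The helper name `drop_known_columns` is hypothetical.

```python
import pandas as pd


def drop_known_columns(frame, unwanted):
    """Drop the unwanted columns that exist; return the frame and the missing names."""
    missing = [col for col in unwanted if col not in frame.columns]
    present = [col for col in unwanted if col in frame.columns]
    return frame.drop(columns=present), missing


# Made-up stand-in for the county table.
toy = pd.DataFrame({"ec_county": [0.8], "pop2018": [1000], "clustering_county": [0.1]})
cleaned, not_found = drop_known_columns(toy, ["pop2018", "child_ec_county"])
print(list(cleaned.columns), not_found)
```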

Summary Statistics

The following is the range of values for each column:
ec_county - 0.29469 to 1.3597. The lower the number, the less connectedness there is between high-SES and low-SES individuals. The standard error (ec_se_county) ranges from around 0.004 to 0.05 per county.
ec_high_county - 0.70062 to 1.71507. The lower the number, the less connectedness there is among high-SES individuals. The standard error (ec_high_se_county) also ranges from around 0.004 to 0.05 per county, similar to ec_county.
exposure_grp_mem_county - 0.2552 to 1.48628. The lower the number, the less exposure low-SES individuals have to high-SES individuals.
exposure_grp_mem_high_county - 0.51013 to 1.66616. The lower the number, the less exposure high-SES individuals have to other high-SES individuals.
clustering_county - 0.07162 to 0.26097. The lower the number, the fewer mutual friend circles there are within the county.
volunteering_rate_county - 0.00965 to 0.308736. The lower the number, the fewer volunteers the county has.
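
The ranges above can be reproduced in code rather than read off the spreadsheet. A minimal sketch, assuming the column names listed earlier and using made-up values in place of the real table (the helper name `column_ranges` is hypothetical):

```python
import pandas as pd


def column_ranges(frame, columns):
    """Return the minimum, maximum, and mean of each requested column."""
    return frame[columns].agg(["min", "max", "mean"]).T


# Made-up values standing in for three counties.
toy = pd.DataFrame(
    {
        "ec_county": [0.30, 0.80, 1.30],
        "clustering_county": [0.08, 0.10, 0.24],
    }
)
ranges = column_ranges(toy, ["ec_county", "clustering_county"])
print(ranges)
```

On the real data the same helper could be called with `df` and the list of retained columns.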

We will also note the mean (average) values for future reference.

mean_values = df.mean(numeric_only=True)
print(mean_values)
county                          30218.783101
ec_county                           0.814464
ec_se_county                        0.013409
ec_high_county                      1.252636
ec_high_se_county                   0.014754
exposure_grp_mem_county             0.906089
exposure_grp_mem_high_county        1.078581
clustering_county                   0.116456
volunteering_rate_county            0.078068
dtype: float64

Analytical Process

Because this dataset is entirely numerical and location-based, the best ways to explore it are scatterplots, histograms, and heat maps. These are also the best visualizations for further analysis; however, we will first look into the following questions before investigating further.

    1) Is there any relationship between economic connectedness between low and high SES and just high SES? If so, what’s the pattern? Do those categories seem to have a balanced distribution, or are there more counties that are on the lower or higher end of the spectrum?
    2) What do the levels of clustering_county, or fractions of friend pairs, look like around the country? Is there any clear average?
    3) What do the levels of volunteering rates look like around the country? Is there any clear average?

After we have a better understanding of these patterns, we will investigate the research questions more directly. This will involve creating visualizations that explore the connections among the three factors of connectedness, cohesiveness, and civic engagement, rather than focusing on each factor individually as in the exploration stage. Then we can move on to the discussion and analyze how the results relate to the research questions.
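
Whether a distribution is balanced can also be checked numerically before plotting. The sketch below compares the mean with the median and computes the sample skewness with pandas; the helper name `distribution_summary` and the toy values are assumptions for illustration only.

```python
import pandas as pd


def distribution_summary(series):
    """Summarize how balanced a column is: mean, median, and skewness."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "skew": series.skew(),  # > 0 means a longer right tail
    }


# Made-up clustering values: most are low, one is much higher.
toy_clustering = pd.Series([0.08, 0.09, 0.10, 0.10, 0.11, 0.26])
summary = distribution_summary(toy_clustering)
print(summary)
```

A mean well above the median, together with a positive skew, would indicate that a few counties with high values are pulling the average up.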

Results

Following the initial assessment of the data, it is likely that connectedness does not correlate with cohesiveness. Cohesiveness around the country seems to be heavily skewed, while connectedness is more balanced. It is also worth investigating whether exposure to low-SES and high-SES individuals increases connectedness. Such a finding would suggest that connectedness is primarily influenced by exposure rather than being solely a matter of cultural emphasis.

However, higher social cohesion likely does lead to higher rates of civic engagement. This is because it builds more of a community, and as we have seen in areas like Alaska, both cohesion and civic engagement tend to be higher.

To begin, let’s first see if there’s any relationship between exposure and connectedness.

import requests

# The GeoJSON used below does not list counties as "County Name, State Name",
# so keep only the county part of each name in the dataset.
def extract_county_name(name):
    parts = name.split(',')
    if len(parts) > 1:
        return parts[0].strip()
    return name

df['county_name'] = df['county_name'].apply(extract_county_name)

# Download the county boundaries as GeoJSON.
county_geo_url = "https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_20m.json"
response = requests.get(county_geo_url)
county_geo_data = response.json()

# Because maps will be used, find the dataset names that are missing from the
# GeoJSON so that the GeoJSON can be cleaned to match.
geojson_names = set(feature['properties']['NAME'] for feature in county_geo_data['features'])
df_names = set(df['county_name'])
missing_names = df_names - geojson_names

name_mapping = {
    "0500000US51161": {"NAME": "Roanoke County"},
    "0500000US51770": {"NAME": "Roanoke City"},
    "0500000US24510": {"NAME": "Baltimore City"},
    "0500000US29189": {"NAME": "St. Louis County"},
    "0500000US51059": {"NAME": "Fairfax County"},
    "0500000US24005": {"NAME": "Baltimore County"},
    "0500000US51019": {"NAME": "Bedford County"},
    "0500000US35013": {"NAME": "DoÃ\x83±a Ana"},
    "0500000US51159": {"NAME": "Richmond County"},
    "0500000US51760": {"NAME": "Richmond City"},
    "0500000US29510": {"NAME": "St. Louis City"},
    "0500000US29186": {"NAME": "Ste Genevieve"},
    "0500000US51067": {"NAME": "Franklin County"},
}

def update_geojson_feature_properties(feature, properties):
    feature['properties'].update(properties)

for feature in county_geo_data['features']:
    census_area = feature['properties'].get('GEO_ID', '')
    if census_area in name_mapping:
        update_geojson_feature_properties(feature, name_mapping[census_area])

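# Recompute after renaming: missing_names now holds the dataset names that
# still have no match in the GeoJSON.
geojson_names = set(feature['properties']['NAME'] for feature in county_geo_data['features'])
df_names = set(df['county_name'])
missing_names = df_names - geojson_names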
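
Matching on county names alone can collide when two states share a county name (more than one state has a Washington County, for example). One possible alternative, sketched here, joins on the FIPS code instead. It assumes that the `county` column holds the five-digit county FIPS code as a number and that each GeoJSON feature's `GEO_ID` is the prefix `0500000US` followed by that code, as in the keys of `name_mapping` above. The helper name `fips_to_geo_id` is hypothetical.

```python
def fips_to_geo_id(fips):
    """Build a GEO_ID string such as '0500000US01001' from a numeric FIPS code."""
    return "0500000US" + str(int(fips)).zfill(5)


# Example codes: 1001 loses its leading zero when read as a number.
print(fips_to_geo_id(1001))
print(fips_to_geo_id(51161))
```

With a column built this way (for instance `df['geo_id'] = df['county'].apply(fips_to_geo_id)`), a choropleth could be keyed on `feature.properties.GEO_ID` rather than on `NAME`, if those assumptions hold for the files used here.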

Data Visualization #1

df.plot.scatter(x='ec_county', y='exposure_grp_mem_county', color='royalblue')
plt.xlabel("Economic Connectedness between low and high SES")
plt.ylabel("Exposure of Population to Low and High SES")
plt.title("Relationship between Exposure to Connection between low and high SES")
plt.show()

There is a strong linear correlation between exposure and connectedness. Counties with greater exposure across socioeconomic groups tend to exhibit higher levels of connectedness. The outliers mostly consist of counties that have higher exposure but lower connectedness than usual. This is a key finding, as it suggests that connectedness is predominantly influenced by the extent of exposure rather than rooted primarily in a county’s cultural influences. Otherwise, the correlation would be much less linear and outliers would be more frequent.

This finding further decreases the likelihood of a correlation between cohesiveness and connectedness. Cohesiveness reflects how clique-like a social network is and how integrated mutual friends are, which results from choices that people make. Exposure is not a choice; people encounter it simply by living in a certain county. These distinct motivational factors make it less probable that a causal relationship exists between the two.
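
The strength of the linear relationship described above can be quantified rather than judged by eye. A minimal sketch, using made-up points in place of the county data, computes the Pearson correlation coefficient and a least-squares line with NumPy. The helper name `linear_relationship` is an assumption for illustration.

```python
import numpy as np


def linear_relationship(x, y):
    """Return the Pearson r and the slope/intercept of the least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, deg=1)
    return r, slope, intercept


# Made-up connectedness (x) and exposure (y) values for five counties.
toy_connectedness = [0.5, 0.7, 0.8, 1.0, 1.2]
toy_exposure = [0.6, 0.8, 0.9, 1.1, 1.3]
r, slope, intercept = linear_relationship(toy_connectedness, toy_exposure)
print(round(r, 3), round(slope, 3), round(intercept, 3))
```

On the real table the inputs would be `df['ec_county']` and `df['exposure_grp_mem_county']`, after dropping rows with missing values so that the result is not NaN.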

Data Visualization #2

df.plot.scatter(x='ec_county', y='clustering_county', color='royalblue')
plt.xlabel("Economic Connectedness between low and high SES")
plt.ylabel("Level of Cohesiveness")
plt.title("Relationship between Economic Connectedness and Overall Cohesiveness in a County")
plt.show()

Confirming our initial assumption, the graph shows no clear linear correlation and reveals numerous outliers. Many outliers combine lower-than-average connectedness with higher-than-average cohesion, or the converse. Across the economic connectedness spectrum, most counties show low cohesiveness. This suggests that there is little relation between the two; however, cohesion may be higher in certain counties with a culture of economic hierarchy. We will test whether this could be valid with another scatterplot that measures connectedness exclusively among those of high SES.
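
The word "outlier" is used informally above. One common convention, which is an assumption here and not necessarily the rule the plot was read with, flags values that lie more than 1.5 interquartile ranges outside the middle half of the data:

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up clustering values; the last one sits far above the rest.
toy_clustering = pd.Series([0.09, 0.10, 0.10, 0.11, 0.12, 0.26])
mask = iqr_outliers(toy_clustering)
print(toy_clustering[mask].tolist())
```

Applied to `df['clustering_county']`, the mask could be used to list the flagged counties by name.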

Data Visualization #3

df.plot.scatter(x='ec_high_county', y='clustering_county', color='forestgreen')
plt.xlabel("Economic Connectedness of those exclusively high SES")
plt.ylabel("Level of Cohesiveness")
plt.title("Relationship between Connectedness among Exclusively High SES and Overall Cohesiveness in a County")
plt.show()

This chart looks quite similar to the previous one. While it starts to show a slightly more linear trend, the trend is not strong enough to indicate a correlation. Additionally, many of the outliers follow the same pattern of low connectedness with high cohesion, or the converse.

With this, the economic hierarchy theory loses support. Our focus now turns to the relationship between cohesiveness and civic engagement.

Data Visualization #4

df.plot.scatter(x='clustering_county', y='volunteering_rate_county', color='slategray')
plt.xlabel("Clustering (Cohesiveness) Within a County")
plt.ylabel("Volunteering Rate Within a County")
plt.title("Relationship between Cohesiveness and Civic Engagement")
plt.show()

Contrary to the expectation stated earlier, this scatterplot displays the weakest correlation among all the variables examined in this report. There is no linear pattern, and most of the points sit near the lower left of the graph. This suggests that greater cohesiveness is not associated with greater civic engagement. In fact, many of the outliers sit at opposite ends of the spectrum: either cohesiveness is high but civic engagement is low, or civic engagement is high but cohesiveness is low.
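
The claim that this pair shows the weakest relationship could be checked by ranking every pair of variables by correlation strength. The sketch below does so on made-up data. It uses Spearman rank correlation, a swapped-in choice that is less sensitive to skew and outliers than Pearson's r, and the helper name `rank_pairs_by_correlation` is hypothetical.

```python
from itertools import combinations

import pandas as pd


def rank_pairs_by_correlation(frame, method="spearman"):
    """Return (col_a, col_b, correlation) tuples, weakest relationship first."""
    corr = frame.corr(method=method)
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(frame.columns, 2)]
    return sorted(pairs, key=lambda item: abs(item[2]))


# Made-up values for five counties.
toy = pd.DataFrame(
    {
        "ec_county": [0.5, 0.7, 0.8, 1.0, 1.2],
        "exposure_grp_mem_county": [0.6, 0.8, 0.9, 1.1, 1.3],
        "clustering_county": [0.12, 0.09, 0.15, 0.10, 0.11],
    }
)
ranked = rank_pairs_by_correlation(toy)
for col_a, col_b, rho in ranked:
    print(col_a, col_b, round(rho, 2))
```

On the real table, the pairs at the top of this list would be the ones with the least monotonic relationship.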

We will examine this relationship one final time with a choropleth heat map to potentially uncover geographic clusters or trends that warrant further scrutiny and discussion.

Data Visualization #5

m = folium.Map(location=[36, -98], zoom_start=3)

title_html = '''
<h1 align="center" style="font-size:18px"><b>Correlation between Civic Engagement and Cohesiveness</b></h1>
<p align="center">Utilize the legend near the top right to toggle between different layers.</p>
'''
m.get_root().html.add_child(folium.Element(title_html))

# Pass the cleaned GeoJSON (county_geo_data) rather than the URL so that the
# renamed counties from name_mapping are the ones matched against the dataset.
folium.Choropleth(
    geo_data=county_geo_data,
    data=df,
    columns=['county_name', 'clustering_county'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    name='Cohesiveness Levels',
).add_to(m)

folium.Choropleth(
    geo_data=county_geo_data,
    data=df,
    columns=['county_name', 'volunteering_rate_county'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    name='Civic Engagement Levels',
).add_to(m)

folium.LayerControl().add_to(m)

m

With the exception of Alaska, most areas seem to share a general pattern of low cohesiveness and low volunteer rates, with no behavior that particularly stands out. While some counties stand out for an abnormally high civic engagement rate compared to their surrounding area, such as Niobrara County in Wyoming, I could not find additional research or information about these counties to draw on for the discussion.
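
The Alaska observation could be backed with a state-level summary. The sketch below assumes the original `county_name` values have the form "County, State" (the form that `extract_county_name` splits on) and that the state is read from them before the names are shortened. The helper name `state_means` and the toy rows are made up.

```python
import pandas as pd


def state_means(frame, value_columns):
    """Average the given columns by state, taken from 'County, State' names."""
    state = frame["county_name"].str.split(",").str[-1].str.strip().rename("state")
    return frame.groupby(state)[value_columns].mean()


# Made-up rows; the values are placeholders, not real county figures.
toy = pd.DataFrame(
    {
        "county_name": ["A County, Alaska", "B County, Alaska", "C County, Texas"],
        "clustering_county": [0.20, 0.24, 0.10],
        "volunteering_rate_county": [0.15, 0.25, 0.05],
    }
)
by_state = state_means(toy, ["clustering_county", "volunteering_rate_county"])
print(by_state)
```

Sorting the result by either column would show whether Alaska really leads on both measures.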

Discussion

Key Findings

The research questions aimed to address the connection between connectedness and cohesiveness, along with the relationship between cohesiveness and civic engagement. According to the Social Capital Atlas dataset and the data visualizations in this report, there is no direct correlation within either pair. Cohesiveness remains generally low even as connectedness increases. Both civic engagement and cohesiveness tend to stay low for most counties, with the exception of some outliers. Neither appears to cause the other to change.

Interpretation

The outcome of the analysis did not meet the initial expectations on every point. While connectedness and cohesiveness were not expected to correlate because of their distinct motivational drivers, cohesiveness and civic engagement were expected to be connected, since both reflect voluntary actions.

One plausible explanation for the lack of a relationship between connectedness and cohesiveness is that race might play a more significant role. In an intercultural research study conducted in the Netherlands, researchers found that wealthy Dutch individuals had a stronger preference for friendships with cultural similarities; therefore, they had fewer interethnic friendships. Higher-SES non-Western minority members, on the other hand, sought more interethnic friendships.⁵ This resonates with observations within the United States, where socioeconomic classes within neighborhoods are often divided along racial lines. Ventura, mentioned at the beginning of this report, is one example: a research report found that a greater percentage of minorities, such as Hispanic/Latino, African American, and Asian residents, lived in areas where there was toxicity-weighted pesticide use.⁶

Regarding cohesiveness and civic engagement, the absence of a direct relationship could stem from the fact that cohesiveness primarily measures the existence of, and inclusion within, social cliques, without capturing the depth of relationships or social dynamics. When investigating civic engagement, researchers found that people who felt a stronger connection to their neighborhood were more likely to want to participate in volunteer work.⁷ The cohesiveness measure is not limited to strong relationships; if it were, the results would likely differ.

An intriguing exception was Alaska, the only region with higher rates of cohesiveness and civic engagement throughout a majority of its counties. The Foraker Group, a non-profit organization in Anchorage, Alaska, proudly reports that the state ranks 5th in the nation for volunteerism.⁸ Considering that Alaska has the third-smallest population in the United States, this is significant.⁹ Nearly half of Alaskan residents donate to charity, and the group estimates that about a third of adults serve as board members for a nonprofit. While it is harder to find secondary literature on Alaska’s cohesiveness, this helps explain why its civic engagement rates are higher than those of most other states. Since so much of the population participates, the state’s culture might place a higher emphasis on community, hence the higher rates.

Implication

Considering these interpretations, there are a few implications for societal understanding. For starters, research articles have previously recognized socioeconomic segregation in friendship networks, romantic partners, neighborhoods, education, and workplaces.¹⁰ Yet this study suggests that the divide also depends on other factors, such as race and cultural differences. This is crucial when addressing the persistent wealth gap within American society.

Additionally, increasing civic engagement in society requires fostering a strong community. Since there is no correlation between cohesiveness and civic engagement, a strong community is not simply one in which many people are connected and involved; rather, it is one that can establish a powerful identity and inspire others to become a part of it. Many people aim to meet and network with as many others as they can, but this report suggests that doing so might not always support personal growth and might be more superficial.

Limitation

This research report was severely limited by the fact that the dataset was derived from Facebook data. Relying on Facebook data has its constraints, as it may not fully capture the strength of relationships or accurately represent offline engagement. Additionally, civic engagement is measured by membership in Facebook groups, while it is likely that many users do not record their membership of volunteer groups online. Many users might also befriend people they know of but are not really connected to, simply because they like a high friend count. Hence, the results for civic engagement and connectedness may be skewed. Combining self-reported data with Facebook-derived data could offer a more comprehensive perspective.

Conclusion

This study finds no direct correlation between SES connectedness and cohesiveness, nor between cohesiveness and civic engagement.

Future studies could explore the impact of close relationships on volunteer rates versus that of acquaintances. Investigating other factors among the counties, such as racial, cultural, and generational diversity, could enrich the understanding of the subject and its intertwining dynamics.

While this research study found no correlation, it lays the foundation for more focused and specialized research on the impacts of social capital.

Citations

  1. Chetty, Raj, Matthew O. Jackson, Theresa Kuchler, Johannes Stroebel, Nathaniel Hendren, Robert B. Fluegge, Sara Gong, et al. “Social Capital I: Measurement and Associations with Economic Mobility.” Nature 608, no. 7921 (August 4, 2022): 108–21. https://doi.org/10.1038/s41586-022-04996-4.
  2. “Population of Counties in Alaska 2023.” World Population Review. https://worldpopulationreview.com/states/alaska/counties.
  3. “U.S. Census Bureau QuickFacts: Nome Census Area, Alaska.” United States Census Bureau, n.d. https://www.census.gov/quickfacts/fact/table/nomecensusareaalaska/PST045222.
  4. “U.S. Census Bureau QuickFacts: McMullen County, Texas.” United States Census Bureau, n.d. https://www.census.gov/quickfacts/fact/table/mcmullencountytexas/PST045222.
  5. Damen, Roxy Elisabeth Christina, Borja Martinović, and Tobias H. Stark. “Explaining the Relationship between Socio-Economic Status and Interethnic Friendships: The Mediating Role of Preferences, Opportunities, and Third Parties.” International Journal of Intercultural Relations 80 (January 2021): 40–50. https://doi.org/10.1016/j.ijintrel.2020.11.005.
  6. Temkin, Alexis M., Uloma Igara Uche, Sydney Evans, Kayla M. Anderson, Sean Perrone-Gray, Chris Campbell, and Olga V. Naidenko. “Racial and Social Disparities in Ventura County, California Related to Agricultural Pesticide Applications and Toxicity.” The Science of the Total Environment 853 (December 20, 2022): 158399. https://doi.org/10.1016/j.scitotenv.2022.158399.
  7. Dang, Lisa, Ann‐Kathrin Seemann, Jörg Lindenmeier, and Iris Saliterer. “Explaining Civic Engagement: The Role of Neighborhood Ties, Place Attachment, and Civic Responsibility.” Journal of Community Psychology 50, no. 3 (April 2022): 1736–55. https://doi.org/10.1002/jcop.22751.
  8. The Foraker Group. “Volunteers in Alaska: Understanding the Data,” n.d. https://www.forakergroup.org/volunteers-in-alaska-understanding-the-data/.
  9. “What State Has the Lowest Population? The Top 10 Least-Populated States in the US.” USA TODAY, July 12, 2023. https://www.usatoday.com/story/news/2023/01/02/what-state-has-lowest-population-us-states-ranked-population/10476960002/.
  10. Mijs, Jonathan J. B., and Elizabeth L. Roe. “Is America Coming Apart? Socioeconomic Segregation in Neighborhoods, Schools, Workplaces, and Social Networks, 1970–2020.” Sociology Compass 15, no. 6 (June 2021). https://doi.org/10.1111/soc4.12884.